Overview

Dataset Statistics

Number of Variables 11
Number of Rows 150000
Missing Cells 33655
Missing Cells (%) 2.0%
Duplicate Rows 609
Duplicate Rows (%) 0.4%
Total Size in Memory 13.7 MB
Average Row Size in Memory 96.0 B
Variable Types
  • Categorical: 1
  • Numerical: 10
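
The two percentages in the table above can be reproduced from the raw counts. The sketch below is a minimal check, assuming the missing-cell percentage is taken over all rows × variables cells and the duplicate percentage over the row count. The report does not state either denominator, and the variable names are illustrative.

```python
# Counts copied from the Dataset Statistics table.
n_rows = 150_000
n_vars = 11
missing_cells = 33_655
duplicate_rows = 609

# Assumption: missing cells are measured against every cell (rows x variables).
missing_pct = 100 * missing_cells / (n_rows * n_vars)

# Assumption: duplicate rows are measured against the number of rows.
duplicate_pct = 100 * duplicate_rows / n_rows

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Duplicate rows: {duplicate_pct:.1f}%")
```

Both values round to the figures reported above, which suggests these denominators are the ones the report uses.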

Dataset Insights

number_of_times_90_days_late and number_of_time_60_89_days_past_due_not_worse have similar distributions
monthly_income has 29731 (19.82%) missing values
number_of_dependents has 3924 (2.62%) missing values
revolving_utilization_of_unsecured_lines is skewed
number_of_time_30_59_days_past_due_not_worse is skewed
debt_ratio is skewed
monthly_income is skewed
number_of_open_credit_lines_and_loans is skewed
number_of_times_90_days_late is skewed
number_real_estate_loans_or_lines is skewed
number_of_time_60_89_days_past_due_not_worse is skewed
number_of_dependents is skewed
serious_dlqin_2yrs has constant length 1
revolving_utilization_of_unsecured_lines has 10878 (7.25%) zeros
number_of_time_30_59_days_past_due_not_worse has 126018 (84.01%) zeros
number_of_times_90_days_late has 141662 (94.44%) zeros
number_real_estate_loans_or_lines has 56188 (37.46%) zeros
number_of_time_60_89_days_past_due_not_worse has 142396 (94.93%) zeros
number_of_dependents has 86902 (57.93%) zeros

Variables


serious_dlqin_2yrs

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 9900000
  • The most frequent value (0) occurs over 13.96 times as often as the second most frequent value (1)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 1
2nd row 0
3rd row 0
4th row 0
5th row 0

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 150000
  • The top 2 categories (0, 1) account for over 50.0% of the values
  • serious_dlqin_2yrs has words of constant length
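
The frequency ratio reported for serious_dlqin_2yrs implies a strong class imbalance. The sketch below is a rough estimate of the share of rows with value 1. It assumes only the two categories 0 and 1 are present (the distinct count is 2 and nothing is missing) and treats the reported "over 13.96" as if it were the exact ratio, so the result is an approximation rather than the true count.

```python
n_rows = 150_000
ratio = 13.96  # reported lower bound on count(0) / count(1)

# With count(0) = ratio * count(1) and count(0) + count(1) = n_rows,
# the share of 1s is 1 / (1 + ratio).
minority_share = 1 / (1 + ratio)
approx_minority_rows = n_rows * minority_share

print(f"Approximate share of 1s: {minority_share:.2%}")
print(f"Approximate rows with value 1: {approx_minority_rows:.0f}")
```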

revolving_utilization_of_unsecured_lines

numerical

Approximate Distinct Count 125728
Approximate Unique (%) 83.8%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 2400000
Mean 6.0484
Minimum 0
Maximum 50708
Zeros 10878
Zeros (%) 7.3%
Negatives 0
Negatives (%) 0.0%
  • revolving_utilization_of_unsecured_lines is skewed right (γ1 = 97.6306)

Quantile Statistics

Minimum 0
5-th Percentile 0
Q1 0.02987
Median 0.1542
Q3 0.559
95-th Percentile 1
Maximum 50708
Range 50708
IQR 0.5292

Descriptive Statistics

Mean 6.0484
Standard Deviation 249.7554
Variance 62377.7452
Sum 907265.7082
Skewness 97.6306
Kurtosis 14544.2286
Coefficient of Variation 41.2925
  • revolving_utilization_of_unsecured_lines is not normally distributed (p-value 4.226560077950737e-25)
  • revolving_utilization_of_unsecured_lines has 763 outliers
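
Several of the descriptive statistics above are derived from others. The sketch below checks those relationships for revolving_utilization_of_unsecured_lines using the standard definitions (variance = standard deviation squared, coefficient of variation = standard deviation / mean, IQR = Q3 − Q1, sum = mean × count). Because the inputs are the rounded values printed in the report, the comparison uses a loose tolerance.

```python
import math

# Values copied from the tables for revolving_utilization_of_unsecured_lines.
mean, std = 6.0484, 249.7554
q1, q3 = 0.02987, 0.559
n_rows = 150_000  # no missing values for this variable

variance = std ** 2      # reported: 62377.7452
cv = std / mean          # reported: 41.2925
iqr = q3 - q1            # reported: 0.5292
total = mean * n_rows    # reported: 907265.7082

# Inputs are rounded, so only approximate agreement is expected.
checks = {
    "variance": math.isclose(variance, 62377.7452, rel_tol=1e-3),
    "cv": math.isclose(cv, 41.2925, rel_tol=1e-3),
    "iqr": math.isclose(iqr, 0.5292, rel_tol=1e-3),
    "sum": math.isclose(total, 907265.7082, rel_tol=1e-3),
}
print(checks)
```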

age

numerical

Approximate Distinct Count 86
Approximate Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 2400000
Mean 52.2952
Minimum 0
Maximum 109
Zeros 1
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • age is only mildly skewed right (γ1 = 0.189)

Quantile Statistics

Minimum 0
5-th Percentile 29
Q1 41
Median 52
Q3 63
95-th Percentile 78
Maximum 109
Range 109
IQR 22

Descriptive Statistics

Mean 52.2952
Standard Deviation 14.7719
Variance 218.208
Sum 7.8443e+06
Skewness 0.189
Kurtosis -0.4947
Coefficient of Variation 0.2825
  • age has 46 outliers

number_of_time_30_59_days_past_due_not_worse

numerical

Approximate Distinct Count 16
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 2400000
Mean 0.421
Minimum 0
Maximum 98
Zeros 126018
Zeros (%) 84.0%
Negatives 0
Negatives (%) 0.0%
  • number_of_time_30_59_days_past_due_not_worse is skewed right (γ1 = 22.5969)

Quantile Statistics

Minimum 0
5-th Percentile 0
Q1 0
Median 0
Q3 0
95-th Percentile 2
Maximum 98
Range 98
IQR 0

Descriptive Statistics

Mean 0.421
Standard Deviation 4.1928
Variance 17.5794
Sum 63155
Skewness 22.5969
Kurtosis 522.3591
Coefficient of Variation 9.9583
  • number_of_time_30_59_days_past_due_not_worse is not normally distributed (p-value 4.7209123260435e-25)
  • number_of_time_30_59_days_past_due_not_worse has 23982 outliers
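
The outlier count above is consistent with the common 1.5 × IQR box-plot rule, although the report does not say which rule it applies, so treat that as an assumption. With Q1 = Q3 = 0, both fences collapse to 0 and every non-zero value is flagged. A minimal sketch of the arithmetic:

```python
# Values copied from the tables for number_of_time_30_59_days_past_due_not_worse.
n_rows = 150_000
zeros = 126_018
q1, q3 = 0, 0

# Assumed rule: a value is an outlier if it lies outside
# [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR].
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# The variable has no missing or negative values, so everything outside
# the degenerate interval [0, 0] is simply the set of non-zero rows.
outliers = n_rows - zeros
print(lower_fence, upper_fence, outliers)
```

Under this rule, a large outlier count for a zero-inflated count variable mostly reflects how many rows are non-zero, not how many are extreme.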

debt_ratio

numerical

Approximate Distinct Count 114194
Approximate Unique (%) 76.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 2400000
Mean 353.0051
Minimum 0
Maximum 329664
Zeros 4113
Zeros (%) 2.7%
Negatives 0
Negatives (%) 0.0%
  • debt_ratio is skewed right (γ1 = 95.1568)

Quantile Statistics

Minimum 0
5-th Percentile 0.004329
Q1 0.1751
Median 0.3665
Q3 0.8683
95-th Percentile 2449
Maximum 329664
Range 329664
IQR 0.6932

Descriptive Statistics

Mean 353.0051
Standard Deviation 2037.8185
Variance 4.1527e+06
Sum 5.2951e+07
Skewness 95.1568
Kurtosis 13733.831
Coefficient of Variation 5.7728
  • debt_ratio is not normally distributed (p-value 4.229555214233732e-25)
  • debt_ratio has 31311 outliers

monthly_income

numerical

Approximate Distinct Count 13594
Approximate Unique (%) 11.3%
Missing 29731
Missing (%) 19.8%
Infinite 0
Infinite (%) 0.0%
Memory Size 1924304
Mean 6670.2212
Minimum 0
Maximum 3.0088e+06
Zeros 1634
Zeros (%) 1.1%
Negatives 0
Negatives (%) 0.0%
  • monthly_income is skewed right (γ1 = 114.0389)

Quantile Statistics

Minimum 0
5-th Percentile 1300
Q1 3400
Median 5400
Q3 8249
95-th Percentile 14587.6
Maximum 3.0088e+06
Range 3.0088e+06
IQR 4849

Descriptive Statistics

Mean 6670.2212
Standard Deviation 14384.6742
Variance 2.0692e+08
Sum 8.0222e+08
Skewness 114.0389
Kurtosis 19503.8945
Coefficient of Variation 2.1566
  • monthly_income is not normally distributed (p-value 4.226960358932329e-25)
  • monthly_income has 4879 outliers
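
For a variable with missing values, it helps to know which denominator each percentage uses. The sketch below is a consistency check, not a statement of how the report is implemented: the figures for monthly_income agree with a missing percentage taken over all rows and a unique percentage and sum taken over non-missing rows only.

```python
# Values copied from the tables for monthly_income.
n_rows = 150_000
missing = 29_731
distinct = 13_594
mean = 6670.2212

present = n_rows - missing  # rows with a recorded income

# Missing % over all rows (reported: 19.8%).
missing_pct = 100 * missing / n_rows

# Assumption: unique % is taken over non-missing rows (reported: 11.3%).
unique_pct = 100 * distinct / present

# Sum reconstructed from the mean of the non-missing rows (reported: 8.0222e+08).
approx_sum = mean * present

print(f"{missing_pct:.1f}% missing, {unique_pct:.1f}% unique, sum ~ {approx_sum:.4e}")
```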

number_of_open_credit_lines_and_loans

numerical

Approximate Distinct Count 58
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 2400000
Mean 8.4528
Minimum 0
Maximum 58
Zeros 1888
Zeros (%) 1.3%
Negatives 0
Negatives (%) 0.0%
  • number_of_open_credit_lines_and_loans is skewed right (γ1 = 1.2153)

Quantile Statistics

Minimum 0
5-th Percentile 2
Q1 5
Median 8
Q3 11
95-th Percentile 18
Maximum 58
Range 58
IQR 6

Descriptive Statistics

Mean 8.4528
Standard Deviation 5.146
Variance 26.4808
Sum 1.2679e+06
Skewness 1.2153
Kurtosis 3.0909
Coefficient of Variation 0.6088
  • number_of_open_credit_lines_and_loans is not normally distributed (p-value 2.6495219958400516e-09)
  • number_of_open_credit_lines_and_loans has 3980 outliers

number_of_times_90_days_late

numerical

Approximate Distinct Count 19
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 2400000
Mean 0.266
Minimum 0
Maximum 98
Zeros 141662
Zeros (%) 94.4%
Negatives 0
Negatives (%) 0.0%
  • number_of_times_90_days_late is skewed right (γ1 = 23.0871)

Quantile Statistics

Minimum 0
5-th Percentile 0
Q1 0
Median 0
Q3 0
95-th Percentile 1
Maximum 98
Range 98
IQR 0

Descriptive Statistics

Mean 0.266
Standard Deviation 4.1693
Variance 17.3831
Sum 39896
Skewness 23.0871
Kurtosis 537.721
Coefficient of Variation 15.6756
  • number_of_times_90_days_late is not normally distributed (p-value 4.281857549057716e-25)
  • number_of_times_90_days_late has 8338 outliers

number_real_estate_loans_or_lines

numerical

Approximate Distinct Count 28
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 2400000
Mean 1.0182
Minimum 0
Maximum 54
Zeros 56188
Zeros (%) 37.5%
Negatives 0
Negatives (%) 0.0%
  • number_real_estate_loans_or_lines is skewed right (γ1 = 3.4824)

Quantile Statistics

Minimum 0
5-th Percentile 0
Q1 0
Median 1
Q3 2
95-th Percentile 3
Maximum 54
Range 54
IQR 2

Descriptive Statistics

Mean 1.0182
Standard Deviation 1.1298
Variance 1.2764
Sum 152736
Skewness 3.4824
Kurtosis 60.4748
Coefficient of Variation 1.1095
  • number_real_estate_loans_or_lines is not normally distributed (p-value 2.2022188382642472e-23)
  • number_real_estate_loans_or_lines has 793 outliers

number_of_time_60_89_days_past_due_not_worse

numerical

Approximate Distinct Count 13
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 2400000
Mean 0.2404
Minimum 0
Maximum 98
Zeros 142396
Zeros (%) 94.9%
Negatives 0
Negatives (%) 0.0%
  • number_of_time_60_89_days_past_due_not_worse is skewed right (γ1 = 23.3315)

Quantile Statistics

Minimum 0
5-th Percentile 0
Q1 0
Median 0
Q3 0
95-th Percentile 1
Maximum 98
Range 98
IQR 0

Descriptive Statistics

Mean 0.2404
Standard Deviation 4.1552
Variance 17.2655
Sum 36058
Skewness 23.3315
Kurtosis 545.6645
Coefficient of Variation 17.2854
  • number_of_time_60_89_days_past_due_not_worse is not normally distributed (p-value 4.24909856179237e-25)
  • number_of_time_60_89_days_past_due_not_worse has 7604 outliers

number_of_dependents

numerical

Approximate Distinct Count 13
Approximate Unique (%) 0.0%
Missing 3924
Missing (%) 2.6%
Infinite 0
Infinite (%) 0.0%
Memory Size 2337216
Mean 0.7572
Minimum 0
Maximum 20
Zeros 86902
Zeros (%) 57.9%
Negatives 0
Negatives (%) 0.0%
  • number_of_dependents is skewed right (γ1 = 1.5882)

Quantile Statistics

Minimum 0
5-th Percentile 0
Q1 0
Median 0
Q3 1
95-th Percentile 3
Maximum 20
Range 20
IQR 1

Descriptive Statistics

Mean 0.7572
Standard Deviation 1.1151
Variance 1.2434
Sum 110612
Skewness 1.5882
Kurtosis 3.0015
Coefficient of Variation 1.4726
  • number_of_dependents is not normally distributed (p-value 3.4547170911648486e-22)
  • number_of_dependents has 13336 outliers

Interactions

Correlations

Missing Values